Factorial Models for Noise Robust Speech Recognition
Authors
Abstract
Noise compensation techniques for robust automatic speech recognition (ASR) attempt to improve system performance in the presence of acoustic interference. In feature-based noise compensation, which includes speech enhancement approaches, the acoustic features that are sent to the recognizer are first processed to remove the effects of noise (see Chapter 9). Model compensation approaches, in contrast, are concerned with modifying and even extending the acoustic model of speech to account for the effects of noise. A taxonomy of the different approaches to noise compensation is depicted in Figure 12.1, which serves as a road map for the present discussion. The two main strategies used for model compensation are model adaptation and model-based noise compensation. Model adaptation approaches implicitly account for noise by adjusting the parameters of the acoustic model of speech, whereas model-based noise compensation approaches explicitly model the noise and its effect on the noisy speech features. Common adaptation approaches include maximum likelihood linear regression (MLLR) [56], maximum a posteriori (MAP) adaptation [32], and their generalizations [17, 29, 47]. These approaches, which are discussed in Chapter 11, alter the speech acoustic model in a completely data-driven way given additional training or test data. Adaptation methods are somewhat more general than model-based approaches in that they may handle effects on the signal that are difficult to model explicitly, such as nonlinear distortion and changes in the voice in reaction to noise (the Lombard effect [53]). However, in the presence of additive noise, failing to take into account the known interactions between speech and noise can be detrimental to performance. Model-based noise compensation approaches, in contrast to adaptation approaches, explicitly model the different factors present in the acoustic environment: the speech, the various sources of acoustic interference, and how they interact to form the noisy speech.
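To make the speech-noise interaction concrete, a commonly used mismatch function for additive noise in the log-Mel (log-spectral) domain is shown below; this particular formulation is a standard approximation and is not quoted from the excerpt above.

$$ \mathbf{y} \;\approx\; \mathbf{x} + \log\!\left(1 + e^{\,\mathbf{n}-\mathbf{x}}\right) $$

Here x, n, and y are the clean-speech, noise, and noisy-speech log-Mel feature vectors, and the exponential and logarithm act element-wise. Model-based compensation methods use relations of this form to predict noisy-speech distributions from separate speech and noise models, whereas adaptation methods such as MLLR and MAP simply re-estimate the speech model parameters from data.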
Similar Articles
Improving the performance of MFCC for Persian robust speech recognition
The Mel Frequency cepstral coefficients are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step applied to t...
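The excerpt stops mid-sentence, but the first step it names, spectral mean normalization, can be illustrated with a minimal sketch; the function name and the assumption that the input is a matrix of log-spectral frames are illustrative, not taken from the paper.

```python
import numpy as np

def spectral_mean_normalization(log_spectra):
    """Per-utterance mean normalization of spectral features.

    log_spectra: array of shape (num_frames, num_channels) containing
    log-magnitude (or log-Mel) spectra for one utterance.
    Subtracting the per-channel mean removes stationary channel/noise bias.
    """
    mean = log_spectra.mean(axis=0, keepdims=True)  # mean over time, per channel
    return log_spectra - mean
```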
Modeling State-Conditional Observation Distribution using Weighted Stereo Samples for Factorial Speech Processing Models
This paper investigates the role of factorial speech processing models in noise-robust automatic speech recognition tasks. Factorial models can embed non-stationary noise models using Markov chains as one of their source chains. The paper proposes a scheme for modeling the state-conditional observation distribution of factorial models based on weighted stereo samples. This scheme is an extens...
A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that exhibit a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...
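As a rough illustration of the masking step described above (the bidirectional-neural-network reconstruction used by the paper is not sketched here), a local-SNR reliability mask over spectrogram cells might be computed as follows; the threshold and the noise estimate are assumptions.

```python
import numpy as np

def reliability_mask(noisy_power, noise_power, snr_threshold_db=0.0):
    """Tag time-frequency cells as missing when their local SNR is low.

    noisy_power, noise_power: arrays of shape (frames, channels) holding the
    power spectrogram of the noisy signal and an estimate of the noise power.
    Returns a boolean mask: True for reliable cells, False for missing ones.
    """
    speech_power = np.maximum(noisy_power - noise_power, 1e-10)  # crude speech power estimate
    local_snr_db = 10.0 * np.log10(speech_power / np.maximum(noise_power, 1e-10))
    return local_snr_db >= snr_threshold_db
```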
Speech recognition using factorial hidden Markov models for separation in the feature space
This paper proposes an algorithm for the recognition and separation of speech signals in non-stationary noise, such as another speaker. We present a method to combine hidden Markov models (HMMs) trained for the speech and noise into a factorial HMM to model the mixture signal. Robustness is obtained by separating the speech and noise signals in a feature domain, which discards unnecessary infor...
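A generic sketch of how two independently trained HMMs can be composed into a factorial (product) HMM is given below; the log-add interaction and the Kronecker-product transition structure are standard devices and are not claimed to be the exact separation procedure of the cited paper.

```python
import numpy as np

def compose_factorial_hmm(trans_speech, trans_noise, means_speech, means_noise):
    """Combine a speech HMM and a noise HMM into a product-state factorial HMM.

    trans_speech: (S, S), trans_noise: (N, N) state-transition matrices.
    means_speech: (S, D), means_noise: (N, D) log-spectral state means.
    Returns the (S*N, S*N) product transition matrix and the (S*N, D)
    combined means, using the log-add interaction for additive sources.
    """
    trans = np.kron(trans_speech, trans_noise)      # the two chains move independently
    S, D = means_speech.shape
    N = means_noise.shape[0]
    combined = np.zeros((S * N, D))
    for i in range(S):
        for j in range(N):
            # log(exp(speech) + exp(noise)) per channel, composite state i*N + j
            combined[i * N + j] = np.logaddexp(means_speech[i], means_noise[j])
    return trans, combined
```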
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated strong performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
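For readers unfamiliar with the architecture being described, a toy convolutive bottleneck network is sketched below in PyTorch; the layer sizes, the 40x40 log-Mel input patch, and the class count are assumptions made for illustration, not the configuration used in the cited work.

```python
import torch
import torch.nn as nn

class ConvBottleneckNet(nn.Module):
    """Toy CBN: convolutional front-end, then fully connected layers with a
    narrow bottleneck whose activations are used as robust features."""

    def __init__(self, num_classes=40, bottleneck_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc_in = nn.Linear(64 * 10 * 10, 512)         # assumes a 40x40 input patch
        self.bottleneck = nn.Linear(512, bottleneck_dim)  # bottleneck layer
        self.fc_out = nn.Linear(bottleneck_dim, num_classes)

    def forward(self, x):                 # x: (batch, 1, 40, 40) log-Mel patch
        h = self.conv(x).flatten(start_dim=1)
        h = torch.relu(self.fc_in(h))
        bottleneck_features = torch.relu(self.bottleneck(h))
        return self.fc_out(bottleneck_features), bottleneck_features
```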
Speech Enhancement Employing Variational Noise Model Composition for Robust Speech Recognition in Time-Varying Noisy Environments
This study proposes an effective noise estimation method for robust speech recognition in time-varying noise conditions. The proposed noise estimation scheme employs the Variational Model Composition (VMC) method, in which multiple noise models are generated by selectively applying perturbation factors to the mean parameters of a basis noise model. The noise estimate is obtained by using the posteri...
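The model-generation step described above can be sketched very simply; additive perturbation of the basis mean is an assumption here, and the posterior-weighted selection that the paper uses to form the final noise estimate is not reproduced.

```python
import numpy as np

def compose_noise_models(basis_mean, perturbation_factors):
    """Generate candidate noise-model means from a single basis noise model.

    basis_mean: (D,) mean of the basis noise model (e.g., log-Mel domain).
    perturbation_factors: iterable of (D,) offsets applied to the basis mean.
    Returns an (M+1, D) array: the basis mean plus one perturbed variant each.
    """
    candidates = [basis_mean]
    for factor in perturbation_factors:
        candidates.append(basis_mean + factor)  # one candidate noise model per factor
    return np.stack(candidates)
```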